Text Parsing of a Complex Genre

Authors

  • Harald Lüngen
  • Maja Bärenfänger
  • Mirco Hilbert
  • Henning Lobin
  • Csilla Puskás
Abstract

We present a text parsing component designed to be part of a system that assists students in academic reading and writing. The parser can automatically add a relational discourse structure annotation to a scientific article that a user wants to explore. The discourse structure employed is defined in an XML format and is based on Rhetorical Structure Theory. The architecture of the parser comprises pre-processing components which provide an input text with XML annotations on different linguistic and structural layers. In the first version, these layers are syntactic tagging, lexical discourse marker tagging, logical document structure, and segmentation into elementary discourse segments. The algorithm is based on the shift-reduce parser by Marcu (2000) and is controlled by reduce operations that are constrained by linguistic conditions derived from an XML-encoded discourse marker lexicon. The constraints are formulated over multiple annotation layers of the same text.
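
The general idea of a shift-reduce discourse parser whose reduce step is gated by a discourse marker lexicon can be illustrated with a minimal sketch. This is not the authors' implementation: the `MARKER_LEXICON` dictionary, the `Node` class, the relation names, and the rule that a reduce is licensed only when the right-hand segment opens with a known marker are all assumptions made for illustration. The system described in the abstract reads its lexicon from XML and evaluates constraints over several annotation layers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon: discourse marker -> relation name.
# It stands in for the XML-encoded lexicon mentioned in the abstract.
MARKER_LEXICON = {
    "because": "CAUSE",
    "but": "CONTRAST",
    "for example": "EXAMPLE",
}


@dataclass
class Node:
    """A discourse tree node; leaves are elementary discourse segments."""
    text: str
    relation: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def licensed_relation(right: Node) -> Optional[str]:
    """Return a relation if the right-hand span opens with a known marker."""
    lowered = right.text.lower()
    for marker, relation in MARKER_LEXICON.items():
        if lowered == marker or lowered.startswith(marker + " "):
            return relation
    return None


def parse(segments: list[str]) -> list[Node]:
    """Shift segments onto a stack; reduce the top two when licensed."""
    stack: list[Node] = []
    queue = [Node(s) for s in segments]
    while queue:
        stack.append(queue.pop(0))  # shift
        while len(stack) >= 2:
            relation = licensed_relation(stack[-1])
            if relation is None:
                break  # no constraint satisfied, so shift again
            right = stack.pop()
            left = stack.pop()
            # reduce: join the two spans under the licensed relation
            stack.append(Node(f"{left.text} {right.text}", relation, left, right))
    return stack


if __name__ == "__main__":
    result = parse([
        "The parser adds annotations",
        "because users need structure",
        "but coverage is limited",
    ])
    for root in result:
        print(root.relation, "|", root.text)
```

In this sketch, spans that never satisfy a constraint simply remain as separate items on the stack, so the result may be a forest rather than a single tree.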


Similar resources

Data point selection for genre-aware parsing

In the NLP literature, adapting a parser to new text with properties different from the training data is commonly referred to as domain adaptation. In practice, however, the differences between texts from different sources often reflect a mixture of domain and genre properties, and it is by no means clear what impact each of those has on statistical parsing. In this paper, we investigate how di...


Designing a Discourse Parser for the Evaluative Text Genre

We propose designing a discourse parser specifically for the evaluative text genre. We aim to see whether focusing on a certain genre and relations specific to that genre offers performance gain beyond more generic discourse parsers. In this extended abstract we describe the approach we intend to take, and how this differs from what has been done previously. The problem of discourse parsing It ...


Different Flavors of GUM: Evaluating Genre and Sentence Type Effects on Multilayer Corpus Annotation Quality

Genre and domain are well known covariates of both manual and automatic annotation quality. Comparatively less is known about the effect of sentence types, such as imperatives, questions or fragments, and how they interact with text type effects. Using mixed effects models, we evaluate the relative influence of genre and sentence types on automatic and manual annotation quality for three relate...


Genre in Semantic Networks: A study of the Lexicon of News Articles

Our project aims at understanding text genres within the domain of the news. Advances in computational methods and availability of digital corpora have ushered in a new age of empirically testing intuitions about genres and styles, in particular the automatic classification of a document to its genre. At the same time, identifying systematic patterns of difference between genres, both quantitati...


On the order of Words in Italian: a Study on Genre vs Complexity

In this paper we present a cross-genre study on word order variation in Italian based on automatically dependency-parsed corpora. A comparative analysis focused on dependency direction and dependency distance for major constituents in the sentence is carried out in order to assess the influence of both textual genre and linguistic complexity on the distribution of phenomena of syntactic mark...




Publication date: 2006